Cross-Validation, Shrinkage and Variable Selection in Linear Regression Revisited
Abstract
In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches have been proposed to handle some of the critical issues. To assess and compare several strategies, we conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R² of 0.50 and 0.71, we consider four scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance levels. We assess whether two-step approaches using global or parameterwise shrinkage factors (PWSF) can improve selected models, and compare the results to models derived with the LASSO procedure. Besides MSE, we use model sparsity and further criteria for model assessment. The amount of information in the data influences both the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse than that of the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
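The two-step idea in the abstract (BE, then parameterwise shrinkage) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, uses a fixed |t| threshold (`t_crit`, roughly a 0.05 significance level for large samples) in place of an exact p-value, and estimates the shrinkage factors with K-fold cross-validation. The function names, the threshold and the number of folds are all hypothetical choices made for illustration.

```python
import numpy as np

def ols(Z, y):
    """Least-squares coefficients for y ~ Z (Z already includes an intercept column)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def backward_elimination(X, y, t_crit=2.0):
    """Repeatedly drop the predictor with the smallest |t| until all |t| >= t_crit.

    Returns the column indices of X that remain in the model.
    """
    n = len(y)
    keep = list(range(X.shape[1]))
    while keep:
        Z = np.column_stack([np.ones(n), X[:, keep]])
        beta = ols(Z, y)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (n - Z.shape[1])      # residual variance estimate
        cov = sigma2 * np.linalg.inv(Z.T @ Z)          # covariance of the coefficients
        t = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        keep.pop(worst)
    return keep

def pwsf(X, y, keep, k=10, seed=0):
    """Parameterwise shrinkage factors for the selected predictors (assumed K-fold variant).

    Each observation gets cross-validated partial predictors x_j * beta_j, where
    beta_j is fitted without that observation's fold. Regressing y on these
    partial predictors gives one shrinkage factor per selected variable.
    """
    n = len(y)
    Xs = X[:, keep]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    partial = np.zeros((n, len(keep)))
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        Z = np.column_stack([np.ones(len(train)), Xs[train]])
        beta = ols(Z, y[train])
        partial[test] = Xs[test] * beta[1:]
    return ols(np.column_stack([np.ones(n), partial]), y)[1:]
```

Multiplying each selected coefficient from the full-data fit by its factor gives the shrunken model. A factor close to 1 suggests the coefficient needs little correction, while a factor well below 1 points to a coefficient that was overestimated by the selection step.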
Related resources
Improved Variable Selection with Forward-Lasso Adaptive Shrinkage
Recently, considerable interest has focused on variable selection methods in regression situations where the number of predictors, p, is large relative to the number of observations, n. Two commonly applied variable selection approaches are the Lasso, which computes highly shrunk regression coefficients, and Forward Selection, which uses no shrinkage. We propose a new approach, “Forward-Lasso A...
Collaborative targeted learning using regression shrinkage.
Causal inference practitioners are routinely presented with the challenge of model selection and, in particular, reducing the size of the covariate set with the goal of improving estimation efficiency. Collaborative targeted minimum loss-based estimation (CTMLE) is a general framework for constructing doubly robust semiparametric causal estimators that data-adaptively limit model complexity in ...
Variable Selection in Nonparametric and Semiparametric Regression Models
This chapter reviews the literature on variable selection in nonparametric and semiparametric regression models via shrinkage. We highlight recent developments on simultaneous variable selection and estimation through the methods of least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD) or their variants, but restrict our attention to nonparametric a...
Ensemble Kalman Filtering with Shrinkage Regression Techniques
The classical Ensemble Kalman Filter (EnKF) is known to underestimate the prediction uncertainty resulting from model overfitting and estimation error. This can potentially lead to low forecast precision and an ensemble collapsing into a single realisation. In this paper we present alternative EnKF updating schemes based on shrinkage methods known from multivariate linear regression. These meth...
Variable Inclusion and Shrinkage Algorithms
The Lasso is a popular and computationally efficient procedure for automatically performing both variable selection and coefficient shrinkage on linear regression models. One limitation of the Lasso is that the same tuning parameter is used for both variable selection and shrinkage. As a result, it typically ends up selecting a model with too many variables to prevent over-shrinkage of the regr...